Making Sense of Search Results by Automatic Web-page Classifications

Author

  • Ben Choi
Abstract

This paper reports the development of a system for automatically organizing Internet web pages into meaningful categories. The aim of the system is to allow Internet users to find useful information in less time. The central problem in using the Internet today is finding the information that we need, and with the explosive growth of the Internet, information overload is getting worse. The proposed system automatically classifies web pages based on three types of information: (1) organizational information among web pages (inter-web-page relationships), such as the URL and the links within a web page; (2) meta-web-page information, such as data contained in META tags and the formatting data of a web page; and (3) web-page-content information, such as keywords and phrases in the content of a web page. Our results show that combining all three types of information provides better accuracy.
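As a rough illustration of the three information types named in the abstract, the sketch below pulls each of them out of a single HTML page using only the Python standard library. It is not the system described in the paper: the names `PageFeatureParser`, `tokenize`, and `extract_features` are hypothetical, the tokenization is a simplifying assumption, formatting data is left out, and the abstract does not say how the three types are combined, so the sketch stops at feature extraction.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import re


class PageFeatureParser(HTMLParser):
    """Hypothetical helper: collects links, META data, and visible text."""

    def __init__(self):
        super().__init__()
        self.links = []        # href values of <a> tags
        self.meta = {}         # META name -> content
        self.text_parts = []   # visible text fragments
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def tokenize(text):
    """Lowercase alphanumeric tokens (an assumed, simplified tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(url, html):
    """Group features by the three information types from the abstract."""
    parser = PageFeatureParser()
    parser.feed(html)
    parsed = urlparse(url)
    return {
        # (1) organizational information: the URL and links within the page
        "organizational": {
            "url_tokens": tokenize(parsed.netloc + " " + parsed.path),
            "links": parser.links,
        },
        # (2) meta-web-page information: data contained in META tags
        "meta": parser.meta,
        # (3) web-page-content information: keywords in the page text
        "content": tokenize(" ".join(parser.text_parts)),
    }
```

A classifier could then consume each group separately or together, for example to compare the accuracy of one information type against all three combined.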


Similar resources

Cybergenre: Automatic Identification of Home Pages on the Web

The research reported in this paper is part of a larger project on the automatic classification of web pages by their genres. The long-term goal is the incorporation of web page genre into the search process to improve the quality of the search results. In this phase, a neural net classifier was trained to distinguish home pages from non-home pages and to classify those home pages as personal h...


A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results; ranking takes place according to one or more parameters, such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of the search engine that represents the degree of the vitality...


Optimizing Membership Functions using Learning Automata for Fuzzy Association Rule Mining

Transactions in web data often consist of quantitative data, suggesting that fuzzy set theory can be used to represent such data. The time spent by users on each web page, one type of web data, is regarded as a trapezoidal membership function (TMF) and can be used to evaluate user browsing behavior. The quality of mining fuzzy association rules depends on membership functions and since t...


Categorizing Search Result Records Using Word Sense Disambiguation

Web search engines are designed to retrieve and extract information from web databases and to return dynamic web pages. The Semantic Web is an extension of the current web that includes semantic content in web pages. The main goal of the Semantic Web is to improve the quality of the current web by changing its contents into machine-understandable form. Therefore, the milestone of seman...


Understanding Users' Intent by Deducing Domain Knowledge Hidden in Web Search Query Keywords

Search engines are used by people on a daily basis to retrieve information from the web. When an ambiguous word is present in a query, the specific sense of the keyword is not considered during the search process. Search engines return a large number of web pages as results from all the possible contexts. Users tend to browse only a few pages. Improving the quality of retrieved results is a challenge and...



Publication date: 2001